An Experimental Study on Dynamic Features of Speech Structure
نویسندگان
چکیده
One of the biggest difficulties in automatic speech recognition (ASR) is how to deal with variations of speech signals caused by non-linguistic information, such as age, gender, etc. Various methods have been proposed to compensate for the variations and one of them is speech structure [1]. Speech structure, which extracts only contrastive features and discards absolute features, is proved to be transform-invariant mathematically and to be very robust with the non-linguistic variations experimentally [2]. Although the conventional speech structure extracts local and distant contrastive features, it did not extract dynamic features explicitly which are supposed to exist in the contrastive features. In this paper, we reformulate speech structure based on trajectory HMM and derive trajectory structure (TSR), in which dynamic and contrastive features can be defined and used in ASR. We carry out an experiment of n-best rescoring of isolated word recognition using trajectory structure and obtain 28.5% relative decrease in word error rate.
منابع مشابه
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کاملبهبود عملکرد سیستم بازشناسی گفتار پیوسته بوسیله ویژگیهای استخراج شده از مانیفولدهای گفتاری در فضای بازسازی شده فاز
The design for new feature extraction methods out of the speech signal and combination of their obtained information is one of the most effective approaches to improve the performance of automatic speech recognition (ASR) system. Recent researches have been shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...
متن کاملAn Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملThe effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients
Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on the voice features in Parkinson’s disease (PD) is controversial. No study has evaluated the voice features of PD underwent STN-DBS by the acoustic, perceptual, and patient-based assessments comprehensively. Furthermore, there is no study to investigate prosodic features before and after DBS in PD. The curren...
متن کاملA Study of the Features and Functions of speech Perseverance (With an Emphasis on the Alavi Teachings)
The serious challenge that contemporary human is encountered with has been brought about by the lack of applying ethical and behavioral necessities in his life rather than by the weakness of the rules or lack of technology. One of the mentioned important necessities is the factor of speech perseverance which has a particular conceptual and meaningful weight that is the adducing of the right spe...
متن کامل